Abstract. In this paper we propose a global convex approach for image
hallucination. Altering the idea of classical multi-image super resolution
(SU) systems to single-image SU, we incorporate aligned images to
hallucinate the output. Our work is based on the paper of Tappen et al. [14],
where they use a non-convex model for image hallucination. In comparison,
we formulate a convex primal optimization problem and derive a
fast-converging primal-dual algorithm with a globally optimal solution. We
use a database of face images to incorporate high-frequency details into
the high-resolution output. We show that we can achieve state-of-the-art
results by using a convex approach.

1 Introduction

Single image Super Resolution (SU) systems aim to estimate a high-resolution
(HR) image from a low-resolution (LR) input. This is clearly an ill-posed
problem, because important high-frequency information is lost in the
down-sampling process.

A common constraint in nearly all SU systems is the reconstruction
constraint, which says that the HR result, once down-sampled, should be the
same as the LR input. However, this constraint is weak and the space of
possible solutions is large. A generic smoothness prior, like the Total
Variation (TV), can tighten this constraint, but no lost information is
inferred.

More advanced systems model edge statistics, which can produce HR images
with sharp edges while leaving other regions smooth [12]. This approach has
its advantages in creating sharp edges with minimal jagged or blocky
artifacts. However, the performance of such systems decreases as the
resolution of the input decreases, because the perceptually important edges
vanish. Additionally, such systems cannot introduce novel high-frequency
details that were lost in the down-sampling process.

Baker and Kanade have shown in [1] that systems which rely only on the
reconstruction constraint (possibly augmented with a smoothness prior) cannot
create high-frequency image content. They propose the technique of image
hallucination, where HR image details and their LR correspondences are
learned on a patch basis to synthesise HR images. Such systems, like [S], can
introduce new details which are not present in the low-resolution image.
However, the patch-selection process remains a key problem in such systems,
and the mathematical models make it difficult to control artefacts in the
output. A state-of-the-art enhancement of such a system is the work of Sun et
al. [13], which incorporates their so-called textural context, bridging the
gap between image hallucination and texture synthesis.

While these systems perform well on general images, domain-based SU systems,
where the content of the image is known, have shown further improvements. An
example of such an approach is the face hallucination work of Liu et al. [S].
Their system infers regularities of face appearance to hallucinate details
that a general image model cannot create. However, the system of [S] is
limited to frontal face images and cannot handle large pose and viewpoint
changes.

Figure 1: System overview. We use SiftFlow [10] to find similar-looking
images based on the alignment energy. These candidates are warped to match
the input. The LR input and the aligned candidates are combined to form
the estimate.

Our work is based on the paper of Tappen et al. [14]. This approach aligns
face images prior to the hallucination process and therefore incorporates
the ideas of classic multi-image SU systems. Tappen et al. use PatchMatch [2]
to quickly search for similar face images in a large database. The best
matches are called candidates. These candidates are densely aligned using the
SiftFlow algorithm [10]. Their system incorporates an edge-focused image
prior, a global likelihood function (the reconstruction constraint) and an
example-based, non-convex hallucination model within a Bayesian framework.
Tappen et al. pointed out that if their system cannot find good candidates,
the performance decreases quickly and the results become blurry. These
limitations could be compensated by falling back to an edge-based system. Our
system tries to improve this behaviour by using a hallucination model that is
more robust to outliers.

We propose a similar workflow, but in contrast all our models are convex
and we omit PatchMatch. We search the database with SiftFlow, using the
SiftFlow energy as the similarity measure, and warp the search results with
the same algorithm. We omit PatchMatch because we think it is more important
to have well-aligned candidates than similar-looking images. Our convex
optimization problem joins a total-variation-based image prior, the
reconstruction constraint and a hallucination model that is robust to
outliers. Starting with the primal minimization problem, we derive a generic
saddle-point problem and solve it with a fast-converging primal-dual
algorithm proposed by Chambolle et al. [5]. Figure 1 shows our system
overview.

This paper is organized as follows. After presenting a convex approach for
image hallucination in Section 2, we derive a generic saddle-point problem
and solve it with a so-called primal-dual algorithm in Section 3. In Section 4
we describe the experiments, and we conclude in Section 5.



2 A Convex Approach for Image Hallucination 

As pointed out in the introduction, our work alters the model of Tappen et
al. [14] to a convex approach. Solving a convex minimization problem has
nice advantages. Convexity guarantees that every local minimum is also a
global minimum [l], and fast convergence can be achieved. However, the choice
of the image models and energy terms is crucial for a perceptually good
solution. The minimization problem (1) combines three different image models
and constraints:

    u^* = \arg\min_u \|\nabla u\|_{2,1} + \lambda \|DBu - f\|_2^2 + \gamma \sum_c \|H(u - g_c)\|_1,    (1)

where $u$ is our HR estimate, $f$ the LR input, and $g_c$ are the aligned
candidate images.

The first term is a general smoothness prior equipped with the TV norm.
This model preserves sharp edges while staying smooth in other regions. The
TV is defined as $TV(u) = \int_\Omega |\nabla u| \, dx$, where
$\nabla = (\partial_x, \partial_y)^T$ is the gradient operator and
$\|\cdot\|_1$ is the L1 norm.
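
As a minimal sketch of the discrete counterpart of this definition, the following computes an isotropic TV with forward differences. The discretization and the function names are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def gradient(u):
    """Forward differences with a zero last row/column (Neumann boundary).

    Returns an array of shape (2, H, W) holding d/dx and d/dy.
    """
    g = np.zeros((2,) + u.shape)
    g[0, :, :-1] = u[:, 1:] - u[:, :-1]   # horizontal derivative
    g[1, :-1, :] = u[1:, :] - u[:-1, :]   # vertical derivative
    return g

def total_variation(u):
    """Isotropic TV: sum over pixels of the Euclidean gradient magnitude."""
    g = gradient(u)
    return float(np.sum(np.sqrt(g[0] ** 2 + g[1] ** 2)))

# A sharp step and a smooth ramp with the same total rise from 0 to 1.
step = np.zeros((4, 4))
step[:, 2:] = 1.0
ramp = np.tile(np.linspace(0.0, 1.0, 4), (4, 1))

tv_step = total_variation(step)
tv_ramp = total_variation(ramp)
```

Both images have the same TV, since each of the four rows rises by 1 in total. This is why the prior does not penalize a sharp edge more than a gradual transition.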

The second term of (1) models the reconstruction constraint. The constraint
ensures that the down-sampled HR image matches the LR input; in other words,
the HR estimate, once down-sampled, should be the same as the input. The
matrix $DB$ is composed of a Gaussian blurring (anti-aliasing) filter $B$ and
a down-sampling matrix $D$. The reconstruction constraint implies a linear
model in which the observed image $f$ is a linear transformation of the
undistorted image $u$ plus additive noise, $f = Au + n$ with $A = DB$. If we
model the noise as Gaussian, we end up equipping this term with a quadratic
norm that minimizes the noise. The factor $\lambda$ controls how strongly the
constraint is imposed.
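
The operator $DB$ and the resulting data term can be sketched as follows. The kernel truncation at three standard deviations, the zero padding at the border and all function names are assumptions made for this illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at 3 sigma (an assumption)."""
    radius = int(np.ceil(3.0 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(u, kernel):
    """B: separable Gaussian blur with zero padding at the image border."""
    rows = np.apply_along_axis(np.convolve, 1, u, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def downsample(u, factor):
    """D: keep every `factor`-th pixel in both directions."""
    return u[::factor, ::factor]

def data_term(u, f, kernel, factor, lam):
    """Reconstruction constraint: lam * ||D B u - f||_2^2."""
    residual = downsample(blur(u, kernel), factor) - f
    return lam * float(np.sum(residual ** 2))

# Simulate an LR observation from a random stand-in for an HR image.
rng = np.random.default_rng(0)
u_true = rng.random((16, 16))
kernel = gaussian_kernel(1.0)
f = downsample(blur(u_true, kernel), 4)

cost_true = data_term(u_true, f, kernel, 4, lam=1.0)
cost_shifted = data_term(u_true + 1.0, f, kernel, 4, lam=1.0)
```

The term vanishes for the image that generated $f$ and grows for any image whose blurred, down-sampled version deviates from $f$. Many different HR images map to the same $f$, which is why this constraint alone is weak.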

The third term of (1) represents a non-parametric image model, here referred
to as the hallucination term. Having found similar candidate images in a
database and aligned them to our input, we can introduce high-frequency
details from these candidates. The term minimizes the difference between the
HR result $u$ and the candidate images $g_c$ after applying a high-pass
filter $H$. We apply the high-pass filter $H$ so that only high-frequency
information is inferred from the candidates, because the low-frequency
content is still present in the LR input and need not be taken from the
candidates. Equipping this term with the L1 norm makes it robust to outliers,
which occur if no good candidates were found or if the alignment fails.
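
The robustness of the L1 norm can be illustrated at a single pixel. The sketch below uses made-up candidate values, five that agree and one outlier, and compares the minimizer of an L1 cost with that of a squared L2 cost by a simple grid search.

```python
import numpy as np

# Hypothetical high-pass responses of six candidates at one pixel;
# the last one stands for a badly aligned candidate.
g = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 10.0])

# Evaluate both costs on a grid of possible values for u at this pixel.
grid = np.linspace(0.0, 10.0, 1001)
diff = grid[:, None] - g[None, :]
l1_cost = np.abs(diff).sum(axis=1)
l2_cost = (diff ** 2).sum(axis=1)

u_l1 = float(grid[np.argmin(l1_cost)])   # minimizer of sum_c |u - g_c|
u_l2 = float(grid[np.argmin(l2_cost)])   # minimizer of sum_c (u - g_c)^2
```

The L1 minimizer is the median of the candidate values and ignores the outlier, while the squared L2 minimizer is the mean and is pulled towards it.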

Note that there is a strong relation between the blurring matrix $B$, the
high-pass filter $H$ and the scaling factor. In fact, the high-pass filter
kernel equals an all-pass kernel $\delta$ minus the blurring kernel, so we
incorporate just the frequencies that were lost in the down-sampling process.
Moreover, the blurring kernel depends on the scaling factor [TS]. We choose
the standard deviation of
the blurring kernel as $\sigma = \tfrac{1}{4}\sqrt{\xi^2 - 1}$, where $\xi$ is the scaling factor.
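
The relation between the two kernels can be sketched as follows. The prefactor of 1/4 in the standard deviation is an assumption (exposed here as the parameter `prefactor`), as are the kernel truncation and the function names.

```python
import numpy as np

def blur_sigma(xi, prefactor=0.25):
    """Standard deviation of the blur for scaling factor xi.

    The value of `prefactor` is an assumption of this sketch.
    """
    return prefactor * np.sqrt(xi ** 2 - 1.0)

def gaussian_kernel_2d(sigma):
    """Normalized 2-D Gaussian kernel, truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    t = np.arange(-radius, radius + 1)
    k1 = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    k2 = np.outer(k1, k1)
    return k2 / k2.sum()

def highpass_kernel(blur_kernel):
    """All-pass kernel (discrete delta) minus the blurring kernel."""
    h = -blur_kernel.copy()
    c = blur_kernel.shape[0] // 2
    h[c, c] += 1.0
    return h

sigma = blur_sigma(4)          # scaling factor of the experiments
b = gaussian_kernel_2d(sigma)
h = highpass_kernel(b)
```

Because the blurring kernel sums to one, the high-pass kernel sums to zero: constant image regions are removed and only the content suppressed by the blur remains.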

3 Deriving the Primal-Dual Algorithm 

In this section we derive the first-order primal-dual form of (1). We solve
the resulting generic saddle-point problem with a variational approach, the
primal-dual algorithm of Chambolle et al. [5].

The goal is to transform the primal minimization problem (1) into a
convex-concave saddle-point problem of the type

    \min_x \max_y \langle Kx, y \rangle + G(x) - F^*(y),    (2)

with a continuous linear operator $K$, and $G$ and $F^*$ being convex
functions.

In a first step one has to apply the Legendre-Fenchel transformation, also
referred to as the conjugate of a function [3]. We derive the conjugate of
the total variation (TV) $\|\nabla u\|_{2,1}$ and of the hallucination term
$\gamma \sum_i \|H(u - g_i)\|_1$, introducing the dual variables $p$ and
$r_i$, respectively. Additionally, the primal variables $w_i$ are introduced
as auxiliary variables of the hallucination term, with the $r_i$ acting as
the associated Lagrange multipliers, leading to:

    \min_{u, w \in X} \max_{p, r \in Y} \langle Kx, y \rangle
        + \underbrace{\lambda \|DBu - f\|_2^2 + \gamma \sum_i \|w_i\|_1}_{G(x)}
        + \underbrace{\sum_i \langle -H g_i, r_i \rangle - \delta_{\|p\|_\infty \le 1}(p)}_{-F^*(y)},    (3)



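with the structure of $K$, $x$ and $y$ as:

    K = \begin{pmatrix} \nabla & 0 & \cdots & 0 \\ H & -I & & \\ \vdots & & \ddots & \\ H & & & -I \end{pmatrix}, \quad
    x = \begin{pmatrix} u \\ w_1 \\ \vdots \\ w_n \end{pmatrix}, \quad
    y = \begin{pmatrix} p \\ r_1 \\ \vdots \\ r_n \end{pmatrix}.    (4)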



The term $\delta_{\|p\|_\infty \le 1}(p)$ denotes the indicator function of
the set $\{p : \|p\|_\infty \le 1\}$, and $\|p\|_\infty$ the maximum norm.
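
The step from (1) to (3) rests on two standard identities, sketched here in the notation of this section. The auxiliary constraint $w_i = H(u - g_i)$ is an assumption about how the $w_i$ enter; it is chosen because it reproduces the terms of (3).

```latex
% TV term: dual representation of the (2,1)-norm
\|\nabla u\|_{2,1}
  = \max_{\|p\|_\infty \le 1} \langle \nabla u, p \rangle
  = \max_{p} \; \langle \nabla u, p \rangle - \delta_{\|p\|_\infty \le 1}(p)

% Hallucination term: split with w_i = H(u - g_i) and multiplier r_i
\gamma \|H(u - g_i)\|_1
  = \min_{w_i} \max_{r_i} \; \gamma \|w_i\|_1
      + \langle Hu - w_i - H g_i, \, r_i \rangle
  = \min_{w_i} \max_{r_i} \; \gamma \|w_i\|_1
      + \langle Hu - w_i, \, r_i \rangle + \langle -H g_i, \, r_i \rangle
```

Collecting $\langle \nabla u, p \rangle$ and the terms $\langle Hu - w_i, r_i \rangle$ gives the bilinear part $\langle Kx, y \rangle$, while the remaining terms are split into $G(x)$ and $-F^*(y)$.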

Note that we do not apply the Legendre-Fenchel transformation to the
reconstruction constraint $\lambda \|DBu - f\|_2^2$. Instead, we solve this
sub-problem using the conjugate gradient (CG) method [3] in a subroutine.
Because the reconstruction constraint implies a linear model and thus leads
to a linear system of equations, it is reasonable to use a fast-converging
solver specialized for such systems. We refer to Section 3.2 for further
details.



3.1 Algorithm 

We use the first-order primal-dual algorithm proposed in [5], there referred
to as "Algorithm 1". The idea is to perform a gradient ascent/descent step on
the unconstrained objective function and subsequently reproject the variables
according to the constraints. The gradient step sizes $\sigma$ and $\tau$ are
crucial for convergence and have to satisfy $\tau \sigma L^2 < 1$, with
$L = \|K\|$ the operator norm of $K$. Within an iteration we perform a
gradient descent step in the primal variable $x$ and a gradient ascent step
in the dual variable $y$, each followed by the reprojection using the
prox-operators. Additionally, we perform a linear extrapolation of the dual
variable based on the current and the previous iterates with $\theta = 1$.
This can be seen as an approximate extragradient step and offers fast
convergence.



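• Initialization: Let $\tau \sigma L^2 < 1$, with $L = \|K\|$,
  $\theta \in [0, 1]$, $(x^0, y^0) \in X \times Y$ and set $\bar{y}^0 = y^0$.

• Iterations ($n \ge 0$): Update $x^n$, $y^n$, $\bar{y}^n$ as follows:

    x^{n+1} = (I + \tau \partial G)^{-1}(x^n - \tau K^* \bar{y}^n)
    y^{n+1} = (I + \sigma \partial F^*)^{-1}(y^n + \sigma K x^{n+1})          (5)
    \bar{y}^{n+1} = y^{n+1} + \theta (y^{n+1} - y^n)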
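
The iteration can be sketched on a toy problem. The code below applies the three update steps to 1-D TV denoising, $\min_x \|Kx\|_1 + \frac{\lambda}{2}\|x - f\|_2^2$ with $K$ a forward-difference matrix, which stands in for the full model only for illustration; the function names and the toy problem are assumptions of this sketch.

```python
import numpy as np

def primal_dual(K, prox_G, prox_Fstar, x0, y0, tau, sigma, theta=1.0, iters=3000):
    """Primal-dual iteration with extrapolation of the dual variable."""
    x, y = x0.copy(), y0.copy()
    y_bar = y.copy()
    for _ in range(iters):
        # Gradient descent step in x, followed by the prox of G.
        x = prox_G(x - tau * (K.T @ y_bar), tau)
        # Gradient ascent step in y, followed by the prox of F*.
        y_new = prox_Fstar(y + sigma * (K @ x), sigma)
        # Linear extrapolation of the dual variable.
        y_bar = y_new + theta * (y_new - y)
        y = y_new
    return x, y

# Toy data: a step signal of height 1 with two plateaus of length 10.
n, lam = 20, 1.0
f = np.concatenate([np.zeros(10), np.ones(10)])
K = np.diff(np.eye(n), axis=0)            # forward differences, shape (n-1, n)

L = np.linalg.norm(K, 2)                  # operator norm of K
tau = sigma = 0.99 / L                    # ensures tau * sigma * L^2 < 1

def prox_G(v, t):
    """Prox of G(x) = lam/2 * ||x - f||^2 (closed form)."""
    return (v + t * lam * f) / (1.0 + t * lam)

def prox_Fstar(v, s):
    """Prox of the indicator of {|y_j| <= 1}: pointwise projection."""
    return np.clip(v, -1.0, 1.0)

x, y = primal_dual(K, prox_G, prox_Fstar, f.copy(), np.zeros(n - 1), tau, sigma)
```

For this step signal the minimizer is known in closed form: each plateau moves by $1/(\lambda \cdot 10) = 0.1$ towards the other, so the iterates should approach plateaus at 0.1 and 0.9.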

3.2 The prox-operators

Proximity operators are a powerful generalization of projection operators.
Their importance stems from splitting the objective to be minimized into
simpler functions that can be handled individually [5]. The proximity
operator then serves to "resolve" the sub-gradient $\partial G$ of any
function $G$, even if $G$ is non-smooth. We assume that $F$ and $G$ are
simple, so that one can compute their proximity operators in closed form. The
operator is defined as:

    x = (I + \tau \partial G)^{-1}(\tilde{x}) = \arg\min_x \left\{ \frac{\|x - \tilde{x}\|^2}{2\tau} + G(x) \right\}.    (6)

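In order to apply the algorithm we have to compute the prox-operators
$(I + \sigma \partial F^*)^{-1}$ and $(I + \tau \partial G)^{-1}$. In (3) we
see that

    F^*(y) = \delta_P(p) - \sum_i \langle -H g_i, r_i \rangle,  with P = \{p : \|p\|_\infty \le 1\},    (7)

and

    G(x) = \lambda \|DBu - f\|_2^2 + \gamma \sum_i \|w_i\|_1.    (8)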

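The first term in $F^*(y)$ is the indicator function of a convex set, and the
prox (or resolvent) operator reduces to pointwise Euclidean projections onto
$L^2$ balls. The function $\langle -H g_i, r_i \rangle$ is an inner product,
and its prox operator reduces to an affine function:

    y = (I + \sigma \partial F^*)^{-1}(\tilde{y}) \iff p = \frac{\tilde{p}}{\max(1, |\tilde{p}|)}, \quad r_i = \tilde{r}_i - \sigma H g_i.    (9)

For $\|w_i\|_1$ the resolvent operator is a soft-threshold (shrinkage)
function. The prox-operator of the reconstruction constraint again leads to a
linear problem:

    A_{new} u = b_{new},  with  A_{new} = (I + \lambda \tau A^T A)  and  b_{new} = \lambda \tau A^T f + \tilde{u}.

Together this gives:

    x = (I + \tau \partial G)^{-1}(\tilde{x}) \iff
        w_i = \begin{cases}
                \tilde{w}_i - \tau\gamma & \text{if } \tilde{w}_i > \tau\gamma \\
                \tilde{w}_i + \tau\gamma & \text{if } \tilde{w}_i < -\tau\gamma \\
                0 & \text{else}
              \end{cases}
        (I + \lambda \tau A^T A)\, u = \lambda \tau A^T f + \tilde{u}    (10)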
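
The three closed-form operators can be sketched as follows. The array layout (the two components of $p$ stacked along the first axis) and the function names are assumptions of this illustration; the sign of the affine step is the one that follows from $F^*(y)$ as given in (7).

```python
import numpy as np

def project_unit_ball(p):
    """Pointwise projection p / max(1, |p|); p has shape (2, H, W)."""
    norm = np.sqrt(np.sum(p ** 2, axis=0))
    return p / np.maximum(1.0, norm)

def prox_r(r_tilde, Hg, sigma):
    """Prox of the linear term <H g_i, r_i>: an affine shift of r_tilde."""
    return r_tilde - sigma * Hg

def soft_threshold(w_tilde, t):
    """Prox of t * ||w||_1: shrink every entry towards zero by t."""
    return np.sign(w_tilde) * np.maximum(np.abs(w_tilde) - t, 0.0)

# Two pixels: one gradient vector outside the unit ball, one inside.
p = np.array([[[3.0, 0.3]], [[4.0, 0.4]]])
p_proj = project_unit_ball(p)

# Shrinkage with threshold t = tau * gamma = 1.
w = soft_threshold(np.array([-3.0, -0.5, 0.0, 0.5, 3.0]), 1.0)
```

The vector (3, 4) has length 5 and is scaled back to unit length, while (0.3, 0.4) is left untouched. In the shrinkage step, entries with magnitude below the threshold are set to zero, which is what discards small, unreliable differences to the candidates.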

Note that the CG method expects a symmetric positive definite matrix
$A_{new}$, which is clearly the case here. We apply CG with a so-called
"hot start", where the previous iterate of $u$ is used for initialization.
The hot-start initialization achieves faster convergence of the CG method.
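
A matrix-free CG solver with an optional starting point can be sketched as follows. The small random matrix `A` is a hypothetical stand-in for $DB$, and the tolerance and the product $\lambda\tau$ are arbitrary values chosen for the example.

```python
import numpy as np

def conjugate_gradient(apply_M, b, x0, tol=1e-10, max_iter=200):
    """Solve M x = b for symmetric positive definite M, starting at x0.

    Returns the solution and the number of iterations performed.
    """
    x = x0.copy()
    r = b - apply_M(x)
    d = r.copy()
    rs = float(r @ r)
    for k in range(max_iter):
        if np.sqrt(rs) < tol:
            return x, k
        Md = apply_M(d)
        alpha = rs / float(d @ Md)
        x = x + alpha * d
        r = r - alpha * Md
        rs_new = float(r @ r)
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x, max_iter

# Hypothetical stand-in for A = DB: maps 20 unknowns to 5 observations.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 20))
f = rng.standard_normal(5)
u_tilde = rng.standard_normal(20)
lam_tau = 0.5                                   # product lambda * tau

def apply_M(u):
    """Apply (I + lam_tau * A^T A) without forming the matrix."""
    return u + lam_tau * (A.T @ (A @ u))

b = lam_tau * (A.T @ f) + u_tilde

u_cold, it_cold = conjugate_gradient(apply_M, b, np.zeros(20))
u_warm, it_warm = conjugate_gradient(apply_M, b, u_cold)   # hot start
```

Starting from a point that is already close to the solution, as the previous iterate of $u$ typically is in later primal-dual iterations, the solver needs fewer steps than a start from zero.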

4 Experiments 

In our experiments we use the PubFig83 database presented in [11]. The
database consists of over 14,000 images of public figures, cropped to include
just the faces and resized to an identical resolution of 100 × 100 pixels.

All results are produced in the same manner. First we down-sample the
input by a factor of 4 using bicubic interpolation, followed by a bicubic
up-sampling by the same factor. We use the resampled image as an input to
SiftFlow [10] and search for candidates with the lowest SiftFlow energy.
Figure 1 demonstrates this process. We search only in the set of pictures of
the same individual as the input. We assume that the person in the input has
already been identified by a face-recognition system and that pictures of
this person are available. Having found the best 6 candidate images, we align
them to our input, again using SiftFlow. With these aligned candidates we run
the primal-dual algorithm. Note that the input image $f$ of the algorithm is
still the bicubically down-sampled 25 × 25 image, while the candidates $g_i$
and the result $u^*$ are 100 × 100. On the output we calculate the peak
signal-to-noise ratio (PSNR) and the SSIM index. Figure 2 shows some results
of our algorithm.

Figure 2: Results of our convex approach with a scaling factor of four. In
each image group we present the LR input, our estimate and the original HR
version.
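
For reference, the PSNR of an estimate can be computed as in the following sketch. A peak value of 255, i.e. 8-bit images, is an assumption; the synthetic images are placeholders, not data from the experiments.

```python
import numpy as np

def psnr(estimate, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    err = estimate.astype(np.float64) - reference.astype(np.float64)
    mse = float(np.mean(err ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Placeholder images: a constant error of 16 grey levels at every pixel.
reference = np.zeros((100, 100))
estimate = np.full((100, 100), 16.0)
value = psnr(estimate, reference)
```

The mean PSNR reported for a database is then the average of this value over all test images.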

In our experiments we discovered that a strong reconstruction constraint is
needed, and therefore the $\lambda$ value was set to $\lambda$ = 5 · 10'',
which yielded a high PSNR. The hallucination parameter $\gamma$ was set to
$\gamma = 20$ so that smoothing by the TV regularization is still applied. To
treat color images in the optimization, we performed a so-called
channel-by-channel optimization. A more comprehensive color treatment, called
vectorial total variation, was proposed in fT. This advanced TV
regularization should be included in future work.

We ran our algorithm on all 14,000 images and obtained a mean PSNR of
24.13 dB. This result slightly outperforms the work of Tappen et al., which
achieved a mean PSNR of 24.05 dB. Table 1 shows a comparison of different
algorithms with the achieved PSNR and SSIM index. The table was partly taken
from Tappen et al., and we refer to P3] for further information. In Figure 3
we show a comparison between the results of Tappen et al. and our approach.
The perceptual differences between these results are quite small, which is
not surprising because all these examples achieve a high PSNR and SSIM
compared to the average.



Algorithm             PSNR (dB)   SSIM Index
VISTA                 23.47       0.669
Sun et al. [13]       23.82       0.741
Tappen et al. [14]    24.05       0.748
Our Approach          24.13       0.750

Table 1: Comparison of different algorithms and their achieved PSNR and SSIM
index on the PubFig83 database.

Figure 3: Comparison between the estimates of Tappen et al. and our approach.
The first image of each set shows the LR input. The second image shows the
result of Tappen et al., the third shows our estimate, and the fourth shows
the actual HR image.

5 Conclusion 

We presented a convex and global approach for image hallucination. This
implies a fast-converging algorithm with a globally optimal solution. By
incorporating high-frequency information from similar images we get
perceptually good solutions. Especially if the alignment of the candidate
images works well, the results can be nearly perfect. The SiftFlow algorithm
is a crucial part of our system, first because we use it as a search tool,
and second, and more importantly, because we use it for the alignment of the
images. If SiftFlow is able to align the images, the results are superior to
those where the alignment fails. Tracking failed alignments and replacing
such candidates should achieve improvements in future work.

We think that good alignments are more important than taking images of the
same person. For future work we propose to build a bag of visual words taken
from face images and to apply the same algorithm, so that no face-recognition
system is needed. Due to the fine modeling of the down-sampling, blurring and
high-pass filters and the robust hallucination model, our convex approach
achieves good performance and state-of-the-art results.